Abstract


Staged circuit switching (SCS) is a message-switching technique that
combines a new protocol with new communication hardware. Protocol and
hardware are designed specifically for networks that are intended to
function as integrated, general-purpose MIMD machines, i.e. for
"network computers".

The SCS protocol is a form of circuit-switching that degrades
automatically into packet-switching when unavailable output lines make
further extension of a partial circuit impossible. The SCS hardware
uses a front-end crossbar switch to multiplex some small number of
communication channels among all of a given node's incident links.
Together, hardware and protocol represent an attempt to convert spare
bandwidth into lower network delays. They also allow experimentation
with networks that reconfigure themselves dynamically in response to
measured traffic. We compare SCS to packet-switching, circuit switching
and the "virtual cut-through" protocol of Kermani and Kleinrock, and
discuss an SCS implementation designed for the SBN network computer.

1. Introduction


Staged circuit switching (SCS) is a message-switching technique that
combines a new protocol with new communication hardware. The protocol and
hardware proposals are independent to the extent that the protocol might
be implemented on different hardware and the hardware has properties that
are potentially of interest regardless of protocol. Protocol and hardware
were, however, designed together and complement each other; they will be
presented here as two aspects of a single design.

Both the SCS protocol and the SCS architecture are applicable in
theory to any computer network. In practice, however, they are well-suited
to a specific sub-class of network, the "network computer" sub-class. A
network computer is a network of processor nodes that is intended to
function not as a collection of autonomous hosts but as a single MIMD
machine. Network computers are designed to support experiments in
asynchronous distributed programming. As the cost of microprocessor nodes
falls, such machines have become increasingly interesting. Intuition
insists that there must be some way of combining many cheap,
modestly-powerful processors into a single highly-powerful machine. Many
problems remain though, and the design of effective network-computer
communication protocols and hardware is an important one. SCS is an
approach to this problem.

The SCS protocol combines aspects of packet switching and of
conventional circuit switching. Packet switching is used loosely here to
refer to any store-and-forward protocol in which data is copied into
holding buffers at each intermediate node along some path from source to
destination. No assumption is made about packet size; packets may
conceivably be large enough to encompass any single message. Circuit
switching refers to a class of protocols in which a communicating source s
and destination d first construct a dedicated path or circuit from s to d
and then communicate directly over this path. In time-switched circuit
switching, the dedicated path consists of reserved input and output slots
in each time-division multiplexing switch along the circuit's route. In
space-switched circuit switching, the dedicated path is a physical
connection between source and destination. The SCS protocol is strongly
related to space-switched circuit switching.

Also related to SCS is a proposal called "virtual cut-through"
discussed by Kermani and Kleinrock[1,2]. In virtual cut-through,
intermediate nodes along a message path attempt to send a message onward
as soon as an appropriate output link has been determined. If the
appropriate output channel is free the attempt succeeds; output and input
continue in parallel, with the initial portion of the message being
transmitted while the final portion is being received. Otherwise the
message is accepted and buffered as per normal store-and-forward
procedure. Virtual cut-through, then, attempts to pipeline a message
through the network at a grain size determined by the time required for
routing at each intermediate node.

SCS is compared in the following to packet switching, circuit
switching and virtual cut-through. Proper comparisons between SCS and the
other three are, however, problematic. The others were developed for large
networks, SCS for a network computer. On large networks, the bandwidth and
the cost of internode lines are expected to dominate the bandwidth and
cost of the processors or i/o channels that drive them. Long communication
links between distant nodes are usually far slower than node-internal
memory busses. On a (physically-localized) network computer, on the other
hand, essentially the reverse holds. Data rates usually depend on the
speed of the communication processor or i/o channel, and the cost of
internode lines is trivial. We will keep these fundamentally different
assumptions in mind as we discuss SCS.

Section 2 describes the SCS protocol and section 3, the SCS
architecture. Section 4 discusses the SCS protocol and relates it to other
protocols. Section 5 discusses dynamic network reconfiguration.

SCS has been developed in the context of a network-computer project
called SBN, for Stony Brook Network. In a small prototype implementation
of SBN, communication takes place via conventional packet-switching over
word-parallel, point-to-point links. The design calls for the network to
be configured in a torus, a square grid in which each row and each column
loops back on itself. The torus topology is integral to SBN's design, but
the prototype hardware will soon be replaced. The implementation of SCS
designed for a new version of SBN is outlined in Section 6. (The SBN
project itself is described by Gelernter and Bernstein[3] and
Gelernter[4].) Finally, in Section 7 we discuss related work in
network-computer communication protocol and hardware design.

2. The SCS protocol

In SCS, a source node first transmits the header of a message M and
then awaits an acknowledgement before transmitting M's data portion. As
the header arrives at each intermediate node along its path, that
intermediate node attempts, using a programmable crossbar switch, to
establish a direct physical connection between M's input link and an
appropriate output link. If the appropriate output link is not in use, and
the node at the other end of the output link is in interrupts-enabled
state (i.e., able to accept M's header and process it immediately), then
the attempt succeeds and the next node along M's path examines the header.
Ultimately the header reaches either a node at which the attempt fails or
a node which is M's final destination. In either case, a
dynamically-configured hardware path has been established between
source-node s and destination-node d. This path is used for transmitting
an acknowledgement from d to s (indicating that data transmission may
proceed) and then for transmitting M's data portion from s to d. A final
acknowledgement from d to s indicates successful or unsuccessful receipt.
If d is M's final destination the process is complete; if it is not, M is
accepted and buffered in the intermediate node for later forwarding in the
same fashion. If, in the default case, each attempt to connect input to
output link along M's path fails, then M progresses through the network
precisely as it would under pure store-and-forwarding. On the other hand,
if each attempt succeeds, then a direct physical connection between M's
source and destination has been established precisely as in pure
(space-switched) circuit-switching protocols. In the intermediate case, M
may pass several times between packet- and circuit-switched mode as it
travels from source to destination.
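
The staging behavior described above can be sketched in a few lines of
Python. This is only an illustrative model, not the SBN implementation:
the names Intermediate and circuit_stages are hypothetical, and the two
boolean fields simply stand for the two conditions named in the text (a
free output link and a next node in interrupts-enabled state).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Intermediate:
    """What a header finds on arriving at one intermediate node (hypothetical model)."""
    output_link_free: bool         # the required output link is not in use
    next_interrupts_enabled: bool  # the next node can accept the header now

    def can_extend(self) -> bool:
        # Both conditions must hold for the crossbar connection
        # (input link to output link) to be made.
        return self.output_link_free and self.next_interrupts_enabled


def circuit_stages(intermediates: List[Intermediate]) -> List[int]:
    """Return the length in hops of each circuit stage along a route.

    A route with k intermediate nodes has k + 1 hops. Each failed
    extension ends the current stage: the message is buffered at that
    node, which later acts as the source of the next stage.
    """
    stages = []
    length = 1  # the hop out of the current source node
    for node in intermediates:
        if node.can_extend():
            length += 1            # the circuit grows through this node
        else:
            stages.append(length)  # the message is buffered here
            length = 1             # the next stage starts from this node
    stages.append(length)
    return stages
```

With three intermediate nodes, three successful extensions give a single
four-hop stage (pure circuit switching), three failures give four one-hop
stages (pure store-and-forwarding), and mixed outcomes fall in between.
The number of stages is also the number of times the data portion is
copied.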

The SCS protocol offers potential advantages as against both
store-and-forwarding and conventional circuit switching.

Against  conventional  circuit  switching.  SCS's  potential  advantages 
are  (i)  SCS  is  non-blocking.  When  path  construction  meets  a  roadblock  in 
the  form  of  a  non-interruptable  neighbor  or  unavailable  output  lines,  it 
is  necessary  neither  to  abort  and  re-schedule  the  transmission,  nor  to 
hold  unused  lines  until  the  circuit  can  be  completed.  Transmission  simply 
continues  in  packet-  rather  than  circuit-switched  mode.  SCS  shares  the 
following  two  advantages  with  certain  other  circuit-switched  or  hybrid 
protocols:  (ii)  Distributed  control.  No  appeal  to  a  central  route  manager 
is  required  in  order  to  establish  a  circuit.  (iii)  Flexibility.  Short 
messages  are  more  efficiently  packet-switched,  long  messages  circuit- 
switched.  SCS  can  distinguish  the  two  cases  and  treat  each  appropriately. 

Against store-and-forward switching. SCS shares the advantages of
circuit-switched systems generally. If the SCS protocol succeeds in
establishing an N-node path, the data portion of a message that would have
been recopied N times under store-and-forwarding is copied only once,
directly from source to destination. The message arrives faster and
computation resources network-wide are conserved, insofar as intermediate
nodes along the message path are not required to input, buffer and output
the data portion of the message.

Disadvantages of SCS, and problems for ongoing study, involve the
effects of communication bandwidth lost as a header propagates down a
path, building a circuit. We discuss advantages and disadvantages of SCS
and compare it to virtual cut-through below.

3. The SCS Architecture

The SCS architecture is shown in a highly-simplified schematic in
figure 1. Each node n is provided with a front-end containing a
programmable crossbar switch c. Line l_1 is connected to the front-end
crossbar of n's first neighbor, l_2 to its second neighbor and so on. The
switch allows any line to be connected to any other line or to the DMA
link d. The figure assumes 4 nearest neighbors, as on SBN, but the general
scheme makes no assumptions about number of neighbors. Note that in the
5x5 switch shown, two connections may be maintained through the switch
simultaneously: e.g., one neighbor line may be connected to l_2 while
another is simultaneously connected to d.
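
As a rough illustration of the switch just described, the following
Python sketch models a crossbar whose ports are the neighbor lines and the
DMA link. The class name Crossbar, its methods and the port labels are
assumptions made for this example only; the real switch is hardware and
its control interface is not specified here.

```python
class Crossbar:
    """Toy model of a front-end crossbar (hypothetical interface).

    Ports are the neighbor lines plus the DMA link. Each port can take
    part in at most one connection at a time.
    """

    def __init__(self, ports):
        self.ports = set(ports)
        self.peer = {}  # port -> port it is currently connected to

    def connect(self, a, b):
        """Connect ports a and b; return False if either is already in use."""
        if a == b or a not in self.ports or b not in self.ports:
            raise ValueError("unknown or identical ports")
        if a in self.peer or b in self.peer:
            return False
        self.peer[a] = b
        self.peer[b] = a
        return True

    def disconnect(self, a):
        """Remove the connection that uses port a, if there is one."""
        b = self.peer.pop(a, None)
        if b is not None:
            del self.peer[b]
```

With five ports, at most two disjoint pairs can be connected at once,
which matches the two simultaneous connections noted for the 5x5 switch.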

The  communication  kernel  runs  either  on  the  host  or  on  a  dedicated 
front-end  processor  at  each  node.  In  the  first  case,  the  host  multiplexes 
communication  and  computation.  In  the  second  and  more  likely  case,  the 
dedicated  front-end  interfaces  to  a  host  over  a  shared  bus  or  a  second  DMA 
channel. 

In figure 1, each node interfaces to the net over a single DMA
channel. This would be unacceptable on large networks, where the bandwidth
of communication lines is typically small relative to node-internal memory
bandwidth. But it is likely that, on a fully-developed network computer,
communication-line bandwidth will approach memory bandwidth more closely,
and the utility of multiple DMA channels (each contending for memory bus
cycles) diminishes as line speed approaches bus speed. Nevertheless, the
SCS architecture makes no assumptions about the number of DMA channels
connecting a node to its switch. The allowance it makes for
channels-per-node to be determined independently of the number of lines to
adjacent switches is its fundamental characteristic. The number of
channels per node is optimized to node-internal bus bandwidth. The number
of lines to adjacent switches (i.e., neighbors-per-node) is determined by
network topology. It makes no sense for channels to exceed neighbors, but
there are many cases in which neighbors might exceed channels.

The SCS architecture supports the SCS protocol directly. In addition,
because it allows number-of-channels and number-of-neighbors to be
determined independently, it might well allow construction of more
densely-connected networks than conventional architectures do,
"conventional architectures" being those in which each node is required to
have as many channels as it has neighbors. Increasing the connectivity of
an SCS network requires that the complexity of the crossbars be increased
and that passive inter-switch lines be added, but does not require the
addition of communication channels.

Highly-connected network graphs are desirable in network-computer
architecture in order to maximize available communication capacity and
minimize network diameter. Diameter in particular is a central concern in
the design of large networks, and topologies such as the binary hypercube
provide log-growing diameter in exchange for a log-growing (potentially
large) number of neighbors per node. The SCS protocol itself provides
further incentive for the use of dense topologies; these points are
pursued below.

4. Discussion

As noted above, comparison between SCS and large-network protocols is
difficult because of the fundamentally different hardware assumptions
involved. We will nonetheless compare in general terms the behavior of
packet switching, virtual cut-through and SCS under similar circumstances.
Because we will assume for illustrative purposes that SCS succeeds in
building a circuit from source through to destination, its behavior is
identical to the behavior of circuit-switching protocols.

We  assume  a  network  computer;  we  therefore  assume  that  propagation 
delays  are  negligible.  Suppose  that  the  nodes  of  a  communication  subnet 
require  h  time-units  to  transmit  or  receive  a  packet  header,  d  time-units 
to  transmit  or  receive  the  data  portion  of  a  packet  and  p  time-units  to 
perform  whatever  processing  is  necessary  to  determine  where  a  newly- 
received  packet  goes  next.  For  present  purposes  we  will  assume  h,  d  and  p 
to  be  identical  under  each  of  the  three  protocols  to  be  compared.  (This 
assumption  will  cause  SCS  to  be  underrated,  as  we  discuss  below.) 

Consider a packet that follows a three-hop path from source node 1
through intermediate nodes 2 and 3 to destination node 4. Assuming that
virtual cut-through successfully cuts through nodes 2 and 3 and that SCS
builds a complete circuit from node 1 to node 4, the behavior of the three
protocols is graphed in figure 2. (The figure assumes the simplest
possible store-and-forward protocol, one in which message reception and
header processing occur serially.)

[Figure 2 legend: p = time to route and process a header; h = time to
transmit or receive a header; d = time to transmit or receive the data
portion of a message; SF = store-and-forward; VC = virtual cut-through.]

Let t_{SF} be the time required by the store-and-forward protocol to
handle the packet from transmission of the first byte by the source to
reception of the last byte by the destination. (Note that reception by the
destination is not complete until the destination realizes that it is in
fact the destination.) t_{VC} is likewise the time required by virtual
cut-through, and t_{SCS} the time required by SCS. If j is the length of
the path in hops, then it is clear from the figure (generalizing from 3 to
j hops) that

    t_{SF}  = j(h+p) + jd

    t_{SCS} = j(h+p) + d

    t_{VC}  = j(h+p) + (d-p)

SCS  and  virtual  cut-through  are  faster  than  store-and-forward  by  a  factor 
proportional  to  the  length  of  the  path.  SCS  and  virtual  cut-through  are 
equally  fast  within  a  factor  of  one  packet-processing  delay.  Virtual  cut- 
through  is  faster  by  one  processing  delay  because  it  processes  and 
receives  packets  simultaneously. 
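
To make the comparison concrete, the three transit-time expressions can be
written as small Python functions. The function names and the sample
values of j, h, p and d below are arbitrary choices for illustration; they
are not measurements.

```python
def t_sf(j, h, p, d):
    """Store-and-forward: header, processing and data are repeated at every hop."""
    return j * (h + p) + j * d


def t_scs(j, h, p, d):
    """SCS with a complete circuit: the data portion is sent only once."""
    return j * (h + p) + d


def t_vc(j, h, p, d):
    """Virtual cut-through: as SCS, less one processing delay."""
    return j * (h + p) + (d - p)


if __name__ == "__main__":
    j, h, p, d = 3, 1, 2, 10  # arbitrary illustrative time-units
    print("SF :", t_sf(j, h, p, d))
    print("SCS:", t_scs(j, h, p, d))
    print("VC :", t_vc(j, h, p, d))
```

For these sample values the store-and-forward time exceeds the SCS time by
(j-1)d, and SCS exceeds virtual cut-through by exactly p, as the
expressions require.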

In terms of network delay, then, SCS and virtual cut-through are
comparable within the broad terms of this comparison. Both are ordinarily
superior to store-and-forward; in the worst case, where virtual
cut-through is unable to perform cut-throughs and SCS unable to build
circuits of length greater than one hop, both are essentially identical to
store-and-forward.

This comparison, however, addresses transit times only, not
throughput. Regarding throughput SCS is at a disadvantage: communication
lines that have been incorporated into a circuit are held idle as the
header propagates forward. Lines are never held idle in store-and-forward
or virtual cut-through. Note, however, that SCS is a self-limiting
protocol. Long circuits are constructed only when idle bandwidth is
available. When bandwidth is in short supply, circuits are blocked early
and bandwidth-loss is correspondingly small. SCS is in this sense a
"greedy algorithm" that seizes the largest chunk of bandwidth available at
any given time and converts it into lower network delay. Analytical and
simulation studies now in progress will measure this feedback effect and
determine to what extent it prevents excessive bandwidth-consumption.

There are many unanswered questions regarding the behavior of SCS.
Despite them, it is SCS and not virtual cut-through that is of interest in
our network-computer context, for tactical reasons involving simplicity
and data rates and for strategic reasons involving flexibility. We discuss
tactical points directly below and strategic issues in the next section.

It appears that SCS will be considerably simpler to implement within
the constraints of a network-computer environment than would virtual
cut-through. Virtual cut-through appears to require either independent DMA
channels for each link or a dedicated communication processor that
implements all DMA channels and interfaces to a routing-and-control
processor. SCS, on the other hand, allows a single DMA channel to be
multiplexed among all links via the passive switch.

Closely related to the foregoing is the issue of maximum data-rates
supportable under the two protocols. Virtual cut-through requires that
two channels access one message buffer simultaneously; the maximum
supportable data-rate is therefore roughly one-half the bandwidth of the
bus over which message buffers are accessed. In SCS, on the other hand,
the source message buffer is emptied directly into the destination buffer.
Maximum bandwidth is therefore roughly equal to the bandwidth of
node-internal memory busses. It follows that assuming h, p and d to be
identical under virtual cut-through and SCS is unfair to SCS. In the best
case, h_{SCS} and d_{SCS} are each not much more than half h_{VC} and
d_{VC}, making SCS substantially faster than virtual cut-through.


5.  Dynamic  reconfiguration 


If,  in  an  SCS  system,  a  message  intended  for  transmission  over  a 
multi-hop  path  were  to  find  a  direct  connection  already  in  place  between 
its  source  and  its  destination,  then  header  propagation  time  is  eliminated 
and  transit  time  is  shorter  than  in  either  virtual  cut-through  or  standard 
SCS.  SCS,  then,  encourages  experimentation  with  networks  that  reconfigure 
themselves  dynamically  in  response  to  measured  traffic. 

Consider the SBN torus first. A node N that establishes and removes
a given connection more than j times within some designated period might
conclude that traffic over the path of which that connection is part is
sufficiently heavy to warrant the connection's being left in place for
some longer period. Throughout this designated longer period N ignores
the path break-down interrupts that ordinarily notify all intermediate
nodes to disconnect a path at the end of a given transmission. The
long-term connection is transparent to the source-destination pairs whose
communication paths include it; all such communicating pairs are, in
effect, brought one hop closer together. When the designated period is
over, N responds to the next path break-down interrupt by disconnecting
the path.

In this methodology for dynamic reconfiguration, the communication
kernel's decision to leave a switch connection in place is broadly
analogous to a compiler's decision to store a variable in a register.
Whether to speed communication in the first case or computation in the
second, steps are involved that must be taken explicitly (leaving the
connection or loading the register) and undone explicitly (breaking the
connection or reloading the register). Registers and long-term switch
connections are scarce resources that must be allocated by
carefully-designed algorithms or heuristics.
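
The threshold rule described above for node N might be sketched as
follows. This is one possible reading, not the SBN kernel: the class name
ConnectionMonitor, its parameters, and the choice to count set-up and
tear-down cycles at the moment a break-down interrupt arrives are all
assumptions made for illustration.

```python
from collections import deque


class ConnectionMonitor:
    """Decides, for one switch connection, whether to obey a break-down interrupt."""

    def __init__(self, threshold, window, hold_time):
        self.threshold = threshold  # "j" in the text
        self.window = window        # the "designated period"
        self.hold_time = hold_time  # the "longer period"
        self.cycles = deque()       # times of recent tear-downs
        self.hold_until = None      # end of the current hold, if any

    def on_breakdown(self, now):
        """Handle a path break-down interrupt at time `now`.

        Returns True if the connection should be disconnected and
        False if the interrupt should be ignored.
        """
        if self.hold_until is not None:
            if now < self.hold_until:
                return False        # inside the hold period: ignore
            self.hold_until = None  # hold has expired: disconnect now
            return True
        self.cycles.append(now)
        # Forget cycles that fall outside the designated period.
        while now - self.cycles[0] > self.window:
            self.cycles.popleft()
        if len(self.cycles) > self.threshold:
            self.cycles.clear()
            self.hold_until = now + self.hold_time
            return False            # leave the connection in place
        return True
```

A kernel using such a monitor would call on_breakdown once per interrupt
and disconnect the crossbar connection only when it returns True. Choosing
threshold, window and hold_time is exactly the allocation problem compared
above to register allocation.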


As noted above, the SCS architecture may, however, allow construction
of networks that are considerably denser than SBN with its degree-four
nodes. Dense networks may be configured in such a way as to minimize
diameters (in binary hypercubes, for example) or, on the other hand, in
such a way as to maximize shortest-route redundancy. An instance of this
second kind of configuration is a square torus with two links rather than
one joining every pair of adjacent nodes. The diameter of this 2-link
torus is the same as the diameter of an ordinary 1-link-per-adjacent-pair
torus. But consider a pair of communicating nodes s and d separated by i
horizontal and j vertical hops: in an ordinary 1-link torus, s and d are
connected by \binom{i+j}{j} shortest paths; in the 2-link torus they are
connected by 2^{i+j} \binom{i+j}{j} shortest paths. (To see this, note
that each of the \binom{i+j}{j} shortest paths in the 1-link torus is i+j
hops long. In the 2-link torus an h-hop path exists in 2^h versions, each
version the result of h sequential choices between two possible links per
hop.)
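
The two path counts can be checked numerically. The sketch below compares
the closed forms with a direct recursive count of monotone paths; the
function names are invented for this example, and it assumes i and j are
small enough that the torus wrap-around links offer no equally short
alternative route.

```python
from functools import lru_cache
from math import comb


def shortest_paths_1link(i, j):
    """Closed form for the ordinary torus: choose which hops are vertical."""
    return comb(i + j, j)


def shortest_paths_2link(i, j):
    """Closed form for the 2-link torus: two link choices at each of i+j hops."""
    return 2 ** (i + j) * comb(i + j, j)


@lru_cache(maxsize=None)
def count_by_recursion(i, j, links):
    """Count shortest paths directly, with `links` parallel links per hop."""
    if i == 0 and j == 0:
        return 1
    total = 0
    if i > 0:  # take one horizontal hop over any of the parallel links
        total += links * count_by_recursion(i - 1, j, links)
    if j > 0:  # take one vertical hop over any of the parallel links
        total += links * count_by_recursion(i, j - 1, links)
    return total
```

For example, with i = 2 and j = 1 the ordinary torus offers 3 shortest
paths and the 2-link torus offers 24.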

When a sufficiently large number of acceptably short routes are
available on average between source and destination, dynamic
reconfiguration algorithms might make use of the SCS crossbar switches as
a distributed communication cache rather than as a set of communication
registers. In this case, every switch connection is left in place until
the two links it joins are expressly required for some other circuit. A
cache hit corresponds to a randomly-distributed source and destination
finding each other adjacent.
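
A single switch operating under this cache-like policy might look like the
following sketch. It models only one crossbar, whereas the cache described
above is distributed over the whole network; the class name
CachingCrossbar, the request method and the hit and miss counters are
hypothetical.

```python
class CachingCrossbar:
    """Crossbar whose connections persist until their ports are needed elsewhere."""

    def __init__(self):
        self.peer = {}   # port -> port it is currently connected to
        self.hits = 0
        self.misses = 0

    def request(self, a, b):
        """Ensure ports a and b are connected, evicting conflicting connections."""
        if a == b:
            raise ValueError("cannot connect a port to itself")
        if self.peer.get(a) == b:
            self.hits += 1   # the connection was left in place earlier
            return
        self.misses += 1
        for port in (a, b):
            other = self.peer.pop(port, None)
            if other is not None:
                del self.peer[other]  # evict the connection that used this port
        self.peer[a] = b
        self.peer[b] = a
```

No connection is ever torn down when a transmission ends; it is replaced
only when one of its two ports is requested for a different circuit.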

Dynamic traffic-sensitive reconfiguration heuristics such as these
are particularly interesting in light of the difficulty of what has been
referred to as the "mapping problem" (Bokhari[5]). Network computers like
SBN are designed to support distributed programs consisting of many
simultaneously-active modules. "Mapping problem" refers to the task of
finding a mapping from program modules to network nodes that makes
acceptably efficient use of the network's limited communication resources.
Useful mapping heuristics are known for particular instances[5] but
Bokhari shows the graph isomorphism problem, for which no polynomial-time
solution is known, to be reducible to the most general form of the mapping
problem. Note that the situation is particularly complex on networks such
as SBN that are designed to support a mix of dynamically-loaded jobs; only
some time-varying subset of nodes is free at any given time. SCS makes it
possible to investigate this problem from a different angle: not via
high-level load-time algorithms for configuring the program to suit the
system, but via low-level runtime algorithms for configuring the system to
suit the program.

6. Implementation

There follows a general description of an SCS implementation designed
for SBN. Arango[6] gives a detailed presentation of the hardware design
and the accompanying protocol.

Consider a network in which each node n_i consists of a
communication-processor p_i and an SCS front-end s_i. (Each node contains
a host-processor as well, but its presence is irrelevant in this context.)
Each s_i is connected via two physical lines, a serial data line and a
control line, to each of four adjacent SCS front-ends, and by five lines,
a serial data line and four control lines, to the associated processor
p_i. Each s_i contains a 5x5 crossbar and a 4x4 crossbar. The 5x5 crossbar
interconnects the five data lines incident on s_i. The 4x4 crossbar
interconnects the 4 control lines that terminate on adjacent SCS
front-ends; each of these control lines is also connected via a switchable
tap to one of the control lines between s_i and p_i. This configuration is
shown in figure 3.

At network-initialization time all switches in all crossbars are
open, and control-taps are set such that each node is able to receive a
signal over any control line (fig. 4a). A node n_i wishing to establish a
path to a neighbor n_j so informs n_j by pulsing the control line that
connects their respective SCS front-ends. This pulse is referred to as the
[...] data lines s_j<->p_j and s_j<->s_i in its SCS front end; n_i in turn
connects data lines s_i<->p_i and s_i<->s_j in its own SCS front end, and
thus a data path between n_i and n_j has been established (fig. 4b).

If n_j wishes to extend the circuit onward to n_k, and the requisite
n_j<->n_k lines are free, it establishes contact with n_k as described
above, then connects both n_i<->n_j lines (i.e., both the data line and
the control line) with both n_j<->n_k lines in s_j (figure 4c). A data
path and a control path have now been established between n_i and n_k, and
the circuit may be propagated onward in like fashion.


Node n_j retains no connection to the data-line component of this
onward-propagating circuit, but it continues to monitor its tap into the
circuit's control-line component. Once the path is complete, the source
has transmitted the data portion of its message over the data-line
component of the circuit and the destination has acknowledged receipt, the
source pulses the circuit's control-line component twice. Two successive
pulses are interpreted by all intermediate nodes along the path as a path
break-down signal; node n_j responds to this signal by disconnecting both
the control and the data components of the associated circuit in s_j's
crossbars.


This hardware design allows experimentation with two different
dynamic-reconfiguration techniques. In one, responsibility for leaving a
connection in place over a term longer than one message-transmission
interval rests with the source node. When the source decides that a given
circuit is valuable and should not be torn down, it simply omits the path
break-down signal and the path remains in place. In the other, each node
decides on its own whether, based on observed past demands made on its SCS
switches, a given switch connection is likely to be used again soon and
ought thus to be retained for some longer period. A node that has decided
to retain a connection for some longer period ignores all path break-down
signals pertaining to the connection for the duration of the period.

7. Related work and conclusions

Surveying briefly the communication systems of operational network
computers in SBN's class (the class of general-purpose MIMD machines), we
note that Arachne[7] uses store-and-forwarding over point-to-point links,
Micronet[8] uses store-and-forwarding over contention busses, and Cm*[9]
uses a hierarchical bus to support a network-wide address space. Of
greater interest in the present context is the communication hardware
designed for the prospective X-tree network computer (Sequin[10]). In
X-tree nodes, each link has an associated set of hardware input and output
queues. All queues interface to a common bus. Logic associated with each
link handles transmission and reception of byte-parallel data over that
link, and a dedicated routing processor switches bytes from input to
output queues over the bus. Communication in the X-tree system resembles
virtual cut-through insofar as messages are pipelined through the net at
sub-packet grain size. (The X-tree communication protocol is not, however,
specified in detail by Sequin[10].)

We note finally the relation between SCS and switched interconnection
networks. Switched interconnection networks, as the term will be
understood here, differ from the conventional network architectures
assumed above insofar as communication in such systems takes place through
a potentially multi-stage series of switches. Switches are not associated
with given hosts; they form an independent network. Messages proceed not
from source node through intermediate processor nodes to destination node,
but from source node through the switch network to destination node.
Communication may be either packet- or circuit-switched through the switch
net. When packet-switching is supported, switches must have associated
buffers, and the switch net becomes in effect an assembly of simple,
de-localized front-end processors. Note that, while from the protocol
point of view SCS is midway between circuit and packet switching, from the
architectural point of view SCS, in preserving a network of physical
switches but associating each switch directly with a network host, is
midway between traditional network structure and the switched
interconnection net.

Development of SCS is in the preliminary stages. Much work remains
to be done, particularly in analysis and simulation of the SCS protocol
(as noted, analysis and simulation studies are now underway) and in the
study of dynamic reconfiguration. Research on these problems is
continuing.